Building a Chinese discourse topic corpus with a micro-topic scheme based on theme-rheme theory

Authors

  • Xue-feng Xi
  • Guodong Zhou
Abstract

Background: Building a suitable discourse topic structure is an important issue in discourse topic analysis, which is the core of natural language understanding. The discourse topic structure is both the key basic unit for automatic computing and the key to transforming unstructured data into structured data in big data analytics. Although the discourse topic structure has wide potential for application in discourse analysis and related tasks, research on constructing such discourse resources for the Chinese language is quite limited. In this paper, we propose a micro-topic scheme (MTS) to represent the discourse topic structure in Chinese according to theme-rheme theory, with the elementary discourse topic unit (EDTU) as the node and the referent of the theme or rheme as the link. In particular, thematic progression is employed to directly represent the development of the discourse topic structure.

Corresponding author's affiliation: School of Computer Science and Technology, Soochow University, ShiZi Road, Suzhou, China.
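The graph described in the abstract (EDTUs as nodes, theme-rheme referents as links) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `EDTU` class, its `theme_ref` and `rheme_ref` fields, the two link labels, and the rule that referents match by exact string equality. The labels loosely follow two familiar thematic progression patterns (a repeated theme, and a rheme taken up as the next theme).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EDTU:
    """Hypothetical elementary discourse topic unit (illustrative only).

    theme_ref / rheme_ref are assumed to be pre-annotated referent labels.
    """
    theme_ref: str
    rheme_ref: str


def link_type(prev: EDTU, curr: EDTU) -> Optional[str]:
    """Label the link from an earlier unit to a later one, if any."""
    if curr.theme_ref == prev.theme_ref:
        return "constant"  # the same theme is repeated
    if curr.theme_ref == prev.rheme_ref:
        return "linear"    # the earlier rheme becomes the new theme
    return None


def build_links(units: List[EDTU]) -> List[Tuple[int, int, str]]:
    """Link each unit to the nearest earlier unit whose referent it picks up."""
    links = []
    for j, curr in enumerate(units):
        for i in range(j - 1, -1, -1):  # scan backwards from the previous unit
            kind = link_type(units[i], curr)
            if kind is not None:
                links.append((i, j, kind))
                break
    return links


# Toy input with made-up referent labels.
units = [
    EDTU("corpus", "scheme"),
    EDTU("scheme", "unit"),
    EDTU("scheme", "links"),
    EDTU("weather", "rain"),
]
links = build_links(units)
print(links)  # [(0, 1, 'linear'), (1, 2, 'constant')]
```

The last unit receives no link because its theme matches nothing earlier. A real annotation scheme would need coreference-aware matching rather than string equality.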

Similar resources

A Micro-topic Model for Coreference Resolution Based on Theme-Rheme Structure

Coreference resolution is a major task of natural language processing. Although the mention-pair model is one of the most influential learning-based coreference models, it is hard to make any further improvements of the performance because of its inherent defects. From the perspective of discourse analysis, a micro-topic model based on the theme-rheme structure is proposed for coreference resol...

A Contrastive Analysis of Thematic Progression Patterns of English and Chinese Consecutive Interpretation Texts

As a grammatical device, Theme-Rheme structure plays an important role in organizing and analyzing discourse. Thematic progression patterns (hereinafter called TP patterns) are a significant method for analyzing textual cohesion and organization. Based on linguist Danes’ theory about TP patterns, this paper tries to present a contrastive analysis of the texts of consecutive interpretation in En...

Dual function of first position nominal groups in research article titles: Describing methods and structuring summary

Previous research has identified the nominal group as the most distinctive feature of the research article title. In contrast, the findings reported in this paper suggest Theme/Rheme is the dominant structure in title text. Theme/Rheme structures order and tie nominal groups in titles. When a title starts with a methodological term the first position nominal group acts as a theme marker. Thus, ...

Topic-Based Bengali Opinion Summarization

This paper describes the development of an opinion summarization system that works on a Bengali news corpus. The system identifies the sentiment information in each document, aggregates it, and presents the summary information as text. The present system follows a topic-sentiment model for sentiment identification and aggregation. Topic-sentiment model is designed as discourse lev...

The MULI Project: Annotation and Analysis of Information Structure in German and English

The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theo...

Publication date: 2017